char *re_comp(pat)
char *pat;

re_exec(str)
char *str;

re_subs(src, dst)
char *src;
char *dst;

void re_fail(msg, op)
char *msg;
char op;

void re_modw(str)
char *str;
Re_comp compiles a pattern string into an internal form (a deterministic finite-state automaton) to be executed by re_exec for pattern matching. Re_comp returns zero if the pattern is compiled successfully, otherwise it returns an error message string. If re_comp is called with a null pointer or a pointer to an empty string, it returns without changing the currently compiled regular expression.
Re_comp supports the same limited set of regular expressions found in ed and the Berkeley regex(3) routines:
[1] char Matches itself, unless it is a special
character (meta-character): . \ [ ] * + ^ $
[2] . Matches any character.
[3] \ Matches the character following it, except
when followed by a digit, (, ), < or >
(see [7], [8] and [9]).
It is used as an escape character for all other meta-characters, and itself.
When used in a set ([4]), it is treated as an ordinary character.
[4] [set] Matches one of the characters in the set.
If the first character in the set is ^, it matches a character not
in the set.
The shorthand S-E specifies the set of characters S through E, inclusive.
The special characters ] and - have no special meaning if they
appear as the first characters in the set.
Examples        Match
[a-z]           any lowercase alpha
[^]-]           any char except ] and -
[^A-Z]          any char except uppercase alpha
[a-zA-Z0-9]     any alphanumeric
[5] * Any regular expression form [1] to [4], followed by the
closure character (*) matches zero or more matches of that form.
[6] + Same as [5], except it matches one or more.
[7] \( \) A regular expression in the form [1] to [10], enclosed
as \(form\) matches what form matches.
The enclosure creates a set of tags, used for [8] and for pattern
substitution in re_subs.
The tagged forms are numbered starting from one.
[8] \d A \ followed by a digit matches whatever a previously tagged
regular expression ([7]) matched.
[9] \< Matches the beginning of a word; that is,
an empty string followed by a letter, digit, or _ and not preceded by a
letter, digit, or _ .
\> Matches the end of a word; that is, an empty
string preceded by a letter, digit, or _ , and not followed by a letter,
digit, or _ .
[10] A composite regular expression xy where x and
y are in the form of [1] to [10] matches the longest match of x
followed by a match for y.
[11] ^ $ A regular expression starting with a ^ character
and/or ending with a $ character restricts the pattern matching to the
beginning of the line and/or the end of the line (anchors).
Elsewhere in the pattern, ^ and $ are treated as ordinary
characters.
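The set rules in form [4] can be sketched as a small membership test. This is only an illustration of the rules as stated above, not the code re_comp uses internally; in_set is a hypothetical helper, and it takes the text between the enclosing [ and ] rather than a whole pattern.

```c
/*
 * in_set: hypothetical helper, not part of the library.
 * body is the text between the enclosing [ and ] of a set.
 * Returns 1 if c is matched by the set, 0 otherwise.
 */
int in_set(const char *body, char c)
{
    const unsigned char *p = (const unsigned char *)body;
    unsigned char ch = (unsigned char)c;
    int negate = 0;
    int found = 0;

    if (*p == '^') {                    /* leading ^ inverts the set */
        negate = 1;
        p++;
    }
    while (*p == ']' || *p == '-') {    /* leading ] and - are literal */
        if (*p == ch)
            found = 1;
        p++;
    }
    while (*p != '\0') {
        if (p[1] == '-' && p[2] != '\0') {
            /* shorthand S-E: every character from S through E */
            if (p[0] <= ch && ch <= p[2])
                found = 1;
            p += 3;
        } else {
            /* a single ordinary character (a trailing - lands here) */
            if (*p == ch)
                found = 1;
            p++;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, in_set("^]-", 'x') is 1 and in_set("^]-", ']') is 0, which agrees with the [^]-] row in the examples for form [4].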
Re_exec executes the internal form produced by re_comp and searches the argument string for the regular expression described by the internal form. Re_exec returns 1 if the last regular expression pattern is matched within the string, 0 if no match is found. In case of an internal error (corrupted internal form), re_exec calls the user-supplied re_fail and returns 0.
The strings passed to both re_comp and re_exec may have trailing or embedded newline characters, but must be properly terminated with a NUL.
Re_subs does ed-style pattern substitution, after a successful match is found by re_exec. The source string parameter to re_subs is copied to the destination string with the following interpretation:
[1] & Substitute the entire matched string in the destination.
[2] \d Substitute the substring matched by a tagged subpattern
numbered d, where d is between 1 and 9, inclusive.
[3] \c Treat the next character literally, unless the
character is a digit ([2]).
If the copy operation with the substitutions is successful, re_subs returns 1. If the source string is corrupted, or the last call to re_exec failed, it returns 0.
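The interpretation of the source string can be sketched as follows. This is a model of the three rules above and not the library routine: expand_template and append are hypothetical names, the matched text and tagged substrings are passed in explicitly as NUL-terminated strings (re_subs takes them from the last re_exec), and the destination size argument is an addition that re_subs does not have.

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to dst at *pos; return 0 if they do not fit in cap. */
static int append(char *dst, size_t cap, size_t *pos, const char *s, size_t n)
{
    if (*pos + n >= cap)
        return 0;
    memcpy(dst + *pos, s, n);
    *pos += n;
    dst[*pos] = '\0';
    return 1;
}

/*
 * expand_template: hypothetical model of the re_subs source rules.
 * whole is the entire matched string; tags[1]..tags[9] are the tagged
 * substrings (NULL if that tag was not set). Returns 1 on success,
 * 0 if a tag is missing or dst is too small.
 */
int expand_template(const char *src, const char *whole,
                    const char *tags[10], char *dst, size_t cap)
{
    size_t pos = 0;

    if (cap == 0)
        return 0;
    dst[0] = '\0';
    for (; *src != '\0'; src++) {
        const char *piece = src;        /* default: copy one character */
        size_t n = 1;

        if (*src == '&') {              /* [1] entire matched string */
            piece = whole;
            n = strlen(whole);
        } else if (*src == '\\' && src[1] != '\0') {
            src++;
            if (*src >= '1' && *src <= '9') {   /* [2] tagged substring */
                piece = tags[*src - '0'];
                if (piece == NULL)
                    return 0;
                n = strlen(piece);
            } else {                    /* [3] next character, literally */
                piece = src;
            }
        }
        if (!append(dst, cap, &pos, piece, n))
            return 0;
    }
    return 1;
}
```

For a match of foo2foo with tag 1 holding foo, the source string <&> \1 expands to <foo2foo> foo under this sketch.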
Re_modw adds new characters to an internal table to change re_exec's understanding of what a word should look like when matching with the \< and \> constructs. If the string parameter is a null pointer or an empty string, the table is reset to the default, which contains A-Z a-z 0-9 _ .
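A minimal model of the word table and of the \< and \> tests is sketched below. The names word_tab, word_reset, word_add, is_word_start and is_word_end are hypothetical; the library keeps its own table, and this sketch only mirrors the behavior described here. word_reset must be called once before the other routines are used.

```c
#include <stddef.h>
#include <string.h>

static unsigned char word_tab[256];     /* nonzero: a word character */

/* Reset the table to the default: A-Z a-z 0-9 _ */
void word_reset(void)
{
    const char *dflt =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_";

    memset(word_tab, 0, sizeof word_tab);
    for (; *dflt != '\0'; dflt++)
        word_tab[(unsigned char)*dflt] = 1;
}

/* Model of re_modw: add characters, or reset on a null or empty string. */
void word_add(const char *s)
{
    if (s == NULL || *s == '\0') {
        word_reset();
        return;
    }
    for (; *s != '\0'; s++)
        word_tab[(unsigned char)*s] = 1;
}

/* \< : position i is followed by a word character and not preceded by one. */
int is_word_start(const char *line, size_t i)
{
    int cur = word_tab[(unsigned char)line[i]];
    int prev = i > 0 && word_tab[(unsigned char)line[i - 1]];

    return cur && !prev;
}

/* \> : position i is preceded by a word character and not followed by one. */
int is_word_end(const char *line, size_t i)
{
    int prev = i > 0 && word_tab[(unsigned char)line[i - 1]];
    int cur = line[i] != '\0' && word_tab[(unsigned char)line[i]];

    return prev && !cur;
}
```

After word_add("-"), the string a-b counts as a single word in this model, so is_word_start no longer reports a word beginning at b.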
Re_fail is a user-supplied routine to handle internal errors. Re_exec calls re_fail with an error message string, and the opcode character that caused the error. The default re_fail routine simply prints the message and the opcode character to the standard error and calls exit(2).
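A program that should not exit on an internal error can supply its own re_fail. The sketch below records the error and returns, on the reading that re_exec then returns 0 as described above. The prototype is the synopsis rewritten in ANSI form, which is an assumption about how the library was built, and last_fail_msg, last_fail_op and fail_count are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical bookkeeping for the most recent internal error. */
char *last_fail_msg = NULL;
char last_fail_op = '\0';
int fail_count = 0;

/*
 * User-supplied replacement for the default re_fail.
 * Instead of calling exit(2), it logs the error and remembers it so
 * that the caller can check fail_count after re_exec returns 0.
 */
void re_fail(char *msg, char op)
{
    last_fail_msg = msg;
    last_fail_op = op;
    fail_count++;
    fprintf(stderr, "regex internal error: %s (opcode \\%o)\n",
            msg, (unsigned)(unsigned char)op);
}
```

A caller would compare fail_count before and after re_exec to tell a plain non-match from an internal error.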
Pattern             Matches
foo*.*              fo foo fooo foobar fobar foxx ...
fo[ob]a[rz]         fobar fooar fobaz fooaz
foo\\+              foo\ foo\\ foo\\\ ...
\(foo\)[1-3]\1      foo1foo foo2foo foo3foo
                    (This is the same as foo[1-3]foo, but it takes
                    less internal space.)
\(fo.*\)-\1         foo-foo fo-fo fob-fob foobar-foobar ...
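The first example pattern above can be checked with a small backtracking matcher in the style of Kernighan and Pike. This is a substitute technique for illustration, not the compiled internal form that re_comp builds, and it handles only ordinary characters, ., *, +, a leading ^ and a trailing $. Sets, tags, escapes and word anchors are left out, and sketch_match is a hypothetical name.

```c
#include <stddef.h>

static int match_here(const char *re, const char *text);

/*
 * Match at least min occurrences of c (any character if c is '.'),
 * trying the longest run first, followed by the rest of the pattern.
 */
static int match_closure(int c, int min, const char *re, const char *text)
{
    const char *t = text;

    while (*t != '\0' && (c == '.' || *t == c))
        t++;                            /* consume the longest run */
    while (t - text >= min) {           /* then back off one at a time */
        if (match_here(re, t))
            return 1;
        if (t == text)
            break;
        t--;
    }
    return 0;
}

/* Match the pattern re at the beginning of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[0] == '$' && re[1] == '\0')  /* $ anchors only at the end */
        return *text == '\0';
    if (re[1] == '*')
        return match_closure(re[0], 0, re + 2, text);
    if (re[1] == '+')
        return match_closure(re[0], 1, re + 2, text);
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int sketch_match(const char *re, const char *text)
{
    if (re[0] == '^')                   /* ^ anchors only at the start */
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

Trying the longest run first in match_closure follows rule [10], which asks for the longest match of x followed by a match for y.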
No previous regular expression
Empty closure
Illegal closure
Cyclical reference
Undetermined reference
Unmatched \(
Missing ]
Null pattern inside \(\)
Null pattern inside \<\>
Too many \(\) pairs
Unmatched \)
Software tools, Kernighan & Plauger.
Software tools in Pascal, Kernighan & Plauger.
Grep sources [rsx-11 C dist], David Conroy.
Ed - text editor, Unix Programmer's Manual.
Advanced editing on Unix, B. W. Kernighan.
RegExp sources, Henry Spencer.
Re_comp and re_exec generally perform at least as well as their licensed counterparts. In a very few instances, they are about 10% to 15% slower.